class: center, middle, inverse, title-slide .title[ # APEC8211: Recitation 2 ] .author[ ### Shunkei Kakimoto ] --- class: middle <style type="text/css"> .small-code .remark-code{ font-size: 60% } .medium-code .remark-code{ font-size: 80% } .xlarge { font-size: 150% } .large { font-size: 130% } .medium { font-size: 80% } .small { font-size: 70% } .xsmall { font-size: 50% } .my-one-page-font { font-size: 30px; } .remark-slide-number { display: none; } .remark-slide-content.hljs-github h1 { margin-top: 5px; margin-bottom: 25px; } .remark-slide-content.hljs-github { padding-top: 10px; padding-left: 30px; padding-right: 30px; } .panel-tabs { <!-- color: #062A00; --> color: #841F27; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; padding-bottom: 0px; } .panel-tab { margin-top: 0px; margin-bottom: 0px; margin-left: 3px; margin-right: 3px; padding-top: 0px; padding-bottom: 0px; } .panelset .panel-tabs .panel-tab { min-height: 40px; } .remark-slide th { border-bottom: 1px solid #ddd; } .remark-slide thead { border-bottom: 0px; } .gt_footnote { padding: 2px; } .remark-slide table { border-collapse: collapse; } .remark-slide tbody { border-bottom: 2px solid #666; } .important { background-color: lightpink; border: 2px solid blue; font-weight: bold; } .remark-code { display: block; overflow-x: auto; padding: .5em; background: #ffe7e7; } .remark-code, .remark-inline-code { font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;font-size: 90%; } .hljs-github .hljs { background: #f2f2fd; } .remark-inline-code { padding-top: 0px; padding-bottom: 0px; background-color: #e6e6e6; } .r.hljs.remark-code.remark-inline-code{ font-size: 0.9em } .left-full { width: 80%; float: left; } .left-code { width: 38%; height: 92%; float: left; } .right-plot { width: 60%; float: right; padding-left: 1%; } .left6 { width: 60%; height: 92%; float: left; } .left5 { width: 49%; <!-- height: 92%; --> float: left; } .right5 { width: 49%; float: right; padding-left: 1%; } .right4 { 
width: 39%; float: right; padding-left: 1%; } .left3 { width: 29%; height: 92%; float: left; } .right7 { width: 69%; float: right; padding-left: 1%; } .left4 { width: 38%; float: left; } .right6 { width: 60%; float: right; padding-left: 1%; } ul li{ margin: 7px; } ul, li{ margin-left: 15px; padding-left: 0px; } ol li{ margin: 7px; } ol, li{ margin-left: 15px; padding-left: 0px; } </style> <style type="text/css"> .content-box { box-sizing: border-box; background-color: #e2e2e2; } .content-box-blue, .content-box-gray, .content-box-grey, .content-box-army, .content-box-green, .content-box-purple, .content-box-red, .content-box-yellow { box-sizing: border-box; border-radius: 5px; margin: 0 0 10px; overflow: hidden; padding: 0px 5px 0px 5px; width: 100%; } .content-box-blue { background-color: #F0F8FF; } .content-box-gray { background-color: #e2e2e2; } .content-box-grey { background-color: #F5F5F5; } .content-box-army { background-color: #737a36; } .content-box-green { background-color: #d9edc2; } .content-box-purple { background-color: #e2e2f9; } .content-box-red { background-color: #ffcccc; } .content-box-yellow { background-color: #fef5c4; } .content-box-blue .remark-inline-code, .content-box-blue .remark-inline-code, .content-box-gray .remark-inline-code, .content-box-grey .remark-inline-code, .content-box-army .remark-inline-code, .content-box-green .remark-inline-code, .content-box-purple .remark-inline-code, .content-box-red .remark-inline-code, .content-box-yellow .remark-inline-code { background: none; } .full-width { display: flex; width: 100%; flex: 1 1 auto; } </style> <style type="text/css"> blockquote, .blockquote { display: block; margin-top: 0.1em; margin-bottom: 0.2em; margin-left: 5px; margin-right: 5px; border-left: solid 10px #0148A4; border-top: solid 2px #0148A4; border-bottom: solid 2px #0148A4; border-right: solid 2px #0148A4; box-shadow: 0 0 6px rgba(0,0,0,0.5); /* background-color: #e64626; */ color: #e64626; padding: 0.5em; -moz-border-radius: 
5px; -webkit-border-radius: 5px; } .blockquote p { margin-top: 0px; margin-bottom: 5px; } .blockquote > h1:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h2:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h3:first-of-type { margin-top: 0px; margin-bottom: 5px; } .blockquote > h4:first-of-type { margin-top: 0px; margin-bottom: 5px; } .text-shadow { text-shadow: 0 0 4px #424242; } </style> <style type="text/css"> /****************** * Slide scrolling * (non-functional) * not sure if it is a good idea anyway slides > slide { overflow: scroll; padding: 5px 40px; } .scrollable-slide .remark-slide { height: 400px; overflow: scroll !important; } ******************/ .scroll-box-8 { height:8em; overflow-y: scroll; } .scroll-box-10 { height:10em; overflow-y: scroll; } .scroll-box-12 { height:12em; overflow-y: scroll; } .scroll-box-14 { height:14em; overflow-y: scroll; } .scroll-box-16 { height:16em; overflow-y: scroll; } .scroll-box-18 { height:18em; overflow-y: scroll; } .scroll-box-20 { height:20em; overflow-y: scroll; } .scroll-box-24 { height:24em; overflow-y: scroll; } .scroll-box-30 { height:30em; overflow-y: scroll; } .scroll-output { height: 90%; overflow-y: scroll; } </style> # Outline Review some concepts related to random variables <!-- # main --> [1. CDF, PDF, PMF (Quick review)](#dist) + [Exercise problem 1](#ex1) + [Exercise problem 2 (optional)](#ex2) <!-- # To explain Jensen's inequality --> [2. Mean and variance and covariance(Quick review)](#mean) + [Exercise problem 3](#ex3) + [Exercise problems 4 (optional)](#ex4) [3. 
Introduction of Monte Carlo Simulation](#monte)

[Supplement: Jensen's inequality (Quick review)](#jensen)

---
class: inverse, center, middle
name: dist

# CDF, PDF, and PMF

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

---

.content-box-red[**Distribution function**]

+ Cumulative distribution function (CDF)
  + Definition: <span style="color:red">The CDF of a random variable `\(X\)` is `\(F(x) = Pr[X \leq x]\)`</span>
  + **Verbally**: The CDF `\(F(x)\)` gives the probability that the random variable `\(X\)` is <span style="color:red">less than or equal to</span> a value `\(x\)`.

.left5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-5-1.png" width="100%" style="display: block; margin: auto;" />
]

.right5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />
]

---
<!-- PDF and PMF (1) -->

.content-box-red[**Probability mass function (Discrete random variables)**]

+ **Definition**: `\(\color{red}{\pi(x) = Pr[X = x]}\)`
+ **Verbally**: The probability that `\(X\)` equals the value `\(x\)`

<br>

.content-box-red[**Probability density function (Continuous random variables)**]

+ **Definition**: `\(\color{red}{f(x) = \frac{d}{dx}F(x)} \quad ( = \displaystyle \lim_{h\to 0} \frac{F(x+h)-F(x)}{h})\)`
+ **Verbally**: The density function is the rate of change of the CDF at `\(x\)`. Its value is not itself a probability: the probability that the random variable falls within a particular range of values is the area under the density over that range (see [wikipedia](https://en.wikipedia.org/wiki/Probability_density_function)).
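
A quick numerical check of the PDF definition, as a minimal sketch. It assumes `\(X\)` is standard normal, so that R's `pnorm()` plays the role of `\(F\)` and `dnorm()` the role of `\(f\)`; the evaluation point `x = 0.5` and the step sizes `h` are arbitrary choices for illustration.

```r
# Assumption for illustration: X is standard normal,
# so pnorm() is the CDF F(x) and dnorm() is the density f(x).
x <- 0.5
h <- 10^(-(1:6))  # step sizes shrinking toward 0

# Difference quotient (F(x + h) - F(x)) / h for each step size
diff_quot <- (pnorm(x + h) - pnorm(x)) / h

# Put the difference quotients next to the density at x
cbind(h = h, diff_quot = diff_quot, density = dnorm(x))
```

As `h` shrinks, the difference quotient should get closer to `dnorm(x)`.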
---
<!-- PDF and PMF (2) -->

.content-box-red[**Probability mass function (Discrete random variables)**]

+ **Definition**: `\(\color{red}{\pi(x) = Pr[X = x]}\)`
+ **Verbally**: The probability that `\(X\)` equals the value `\(x\)`

<br>

.content-box-red[**Probability density function (Continuous random variables)**]

+ **Definition**: `\(\color{red}{f(x) = \frac{d}{dx}F(x)} \quad ( = \displaystyle \lim_{h\to 0} \frac{F(x+h)-F(x)}{h})\)`
+ **Verbally**: The density function is the rate of change of the CDF at `\(x\)`. Its value is not itself a probability: the probability that the random variable falls within a particular range of values is the area under the density over that range (see [wikipedia](https://en.wikipedia.org/wiki/Probability_density_function)).

<br>

.content-box-red[**Theorem 2.3: Properties of a PDF**]

A function `\(f(x)\)` is a density function **if and only if**

`$$\begin{cases} f(x) \ge 0 \text{ for all } x \\ \int_{-\infty}^\infty f(x)\,dx = 1 \end{cases}$$`

+ You can use these two conditions to check whether a function is a valid density function!

<!-- if you are asked to show that a function f(x) is a valid density function, check whether f(x) satisfies these properties or not. -->

---
class: middle

.content-box-green[**Relationship between CDF and PDF**]

+ **From CDF to PDF**: `\(f(x) = \frac{d}{dx}F(x)\)` <br> (by definition of PDF)
+ **From PDF to CDF**: `\(F(x) = Pr(X \leq x) = \int_{-\infty}^x f(t) dt\)` <br> (as shown below)

<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-24-1.gif" width="80%" style="display: block; margin: auto;" />

---
name: ex1

# Exercise 1

.content-box-green[**Final Exam: 2021: Problem 1**]

Define `\(\Phi(z)\)` as the CDF of a standard normal random variable `\(Z\)` and `\(\phi(z)\)` as its density function.

(a) Write `\(Pr(Z \leq b)\)` using `\(\Phi()\)`.

(b) Write `\(Pr(Z \leq b)\)` as an integral.

(c) Write `\(Pr(a \leq Z \leq b)\)` using `\(\Phi()\)`.

(d) Write `\(Pr(a \leq Z \leq b)\)` as an integral.
---

# Exercise 2

.content-box-green[**PSE Exercise 2.1**]

Let `\(X \sim U[0,1]\)`. Find the PDF of the random variable `\(Y=X^2\)`.

---
class: inverse, center, middle
name: mean

# Mean and Variance and Covariance

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

---

.content-box-red[**Mean and Variance**]

**Definition** 2.18, 2.19:

+ The mean of `\(X\)` is <span style='color:red'>`\(E[X]\)`</span>
+ The variance of `\(X\)` is <span style='color:red'>`\(Var[X]=E[(X-E[X])^2]\)`</span> `\(= E[X^2] - (E[X])^2\)`

.left5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" />
]

.right5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />
]

---

.content-box-red[**Covariance**]

**Definition**

$$
\color{red}{Cov(X, Y) = E[(X-E[X])(Y-E[Y])]} = E[XY] - E[X]E[Y]
$$

**Verbally**: Covariance measures the joint variability of two random variables.

+ .content-box-green[Visualization]

<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-12-1.png" width="100%" style="display: block; margin: auto;" />

---

.content-box-red[**Covariance**]

**Definition**

$$
\color{red}{Cov(X, Y) = E[(X-E[X])(Y-E[Y])]} = E[XY] - E[X]E[Y]
$$

**Verbally**: Covariance measures the joint variability of two random variables.

+ .content-box-green[Visualization]

<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />

.content-box-red[**Correlation**]

**Definition**

$$
\color{red}{Corr(X, Y) = \frac{Cov(X,Y)}{\sqrt{Var[X] Var[Y]}}}
$$

---

## Play around with data

.content-box-green[**Goal**]

+ Get to know some basic R functions (e.g., for the mean and the variance).
+ See that covariance changes when a variable is rescaled, but correlation does not.
.small-code[
```r
# === Data === #
data(airquality)
# ?airquality

#/*--------------------------------*/
#' ## Basic functions of R
#/*--------------------------------*/
# --- histogram of Temperature (Temp) --- #
hist(airquality$Temp)
# frequency table can be obtained by running table(airquality$Temp)

# --- Mean of Temp (degrees F) --- #
mean(airquality$Temp)

# --- Variance of Temp --- #
var(airquality$Temp)
# sd(airquality$Temp) for standard deviation

# --- summary statistics of Temp --- #
summary(airquality$Temp)

#/*------------------------------------------*/
#' ## Relationship between Wind and Temp
#/*------------------------------------------*/
plot(airquality$Wind, airquality$Temp)

# === Covariance === #
cov(airquality$Wind, airquality$Temp)
# What happens if you change the unit of wind from mph to km/h (1 mph is about 1.6 km/h)?
cov(airquality$Wind*1.6, airquality$Temp)

# === Correlation === #
cor(airquality$Wind, airquality$Temp)
# What happens if you change the unit of wind from mph to km/h (1 mph is about 1.6 km/h)?
cor(airquality$Wind*1.6, airquality$Temp)
```
]

---

## E[ ], Var[ ] as operators

.content-box-red[**Expectation: E[ ]**]

<span style="color:red"> `\(E[ \,]\)` is a linear operator (Linearity of expectation)</span>

For any constants `\(a\)` and `\(b\)`,

`$$E[a+bX] = a + bE[X]$$`

---

## E[ ], Var[ ] as operators

.content-box-red[**Expectation: E[ ]**]

<span style="color:red"> `\(E[ \,]\)` is a linear operator (Linearity of expectation)</span>

For any constants `\(a\)` and `\(b\)`,

`$$E[a+bX] = a + bE[X]$$`

.content-box-red[**Variance: Var[ ]**]

`\(Var[ \,]\)` is <span style="color:red">not</span> a linear operator

$$
Var[a+bX] = b^2 Var[X]
$$

because

`$$\begin{align*} Var[a+bX] &= E[(a+bX - E[a+bX])^2] \\ &= E[(a+bX - a-bE[X])^2] \\ &= E[(b(X - E[X]))^2] \\ &= E[b^2(X - E[X])^2] \\ &= b^2 E[(X - E[X])^2] \\ &= b^2 Var[X] \end{align*}$$`

---

# Exercise 4

.content-box-green[**Lecture note 2, p14**]

Prove these for continuous `\((X,Y)\)` with finite variances.

(a).
If `\(E[X]=0\)` or `\(E[Y]=0\)`, `\(Cov(X,Y)=E[XY]\)`.

(b). If `\(X \perp\!\!\!\perp Y\)`, `\(Corr(X,Y)=0\)`.

(c). If `\(E[X] = E[Y] = 0\)`, `\(Var[X+Y] = Var[X] + Var[Y] + 2Cov(X,Y)\)` (Note: Also true if the expectations are non-zero).

(d). If `\(X\)` and `\(Y\)` are uncorrelated, `\(Var[X+Y] = Var[X] + Var[Y]\)`.

---

# Exercise 3

.content-box-green[**Final Exam: 2021: Problem 3**]

The chi-squared distribution with `\(k\)` degrees of freedom, denoted `\(\chi^2(k)\)`, is the distribution of `\(\sum_{i=1}^k Z^2_{i}\)`, where each `\(Z_i \sim N(0,1)\)` and the `\(Z_i\)` are independent `\((Z_i \perp\!\!\!\perp Z_j)\)`. *You do not need to work with the CDF or density of a `\(\chi^2\)` distribution to answer this question!*

<br>

(a) Show that if `\(X\)` is distributed `\(\chi^2(k)\)` then `\(E[X]=k\)`.

<br>

(b) More work with expectation: Let `\(K=Z^2_{1} + Z^2_{2}\)`, where `\(Z_j \sim N(0,1)\)`. Then `\(K \sim \chi^2(2)\)`. Another fact is that if `\(Z \sim N(0,1)\)`, then `\(E[Z^4]=3\)`. Use that fact to show that `\(Var[K]=4\)`. [Hint: `\(E[Z_j^4]\)` is closely related to `\(Var[Z_j^2]\)`.]

---
class: inverse, center, middle
name: intro

# Monte Carlo Simulation

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

---
class: middle

.content-box-red[**Monte Carlo Simulation**]

+ A way to test econometric theories or statistical procedures in a realistic setting via simulation.

---
class: middle

.content-box-green[**Example: Binomial distribution (PSE 3.4)**]

A binomial random variable equals the number of successes in `\(n\)` independent Bernoulli trials, each with success probability `\(p\)`. If you flip a coin `\(n\)` times, the number of heads has a binomial distribution.

<br>

Theoretically, a binomial random variable `\(X\)` has

`$$\begin{align*} E[X] &= np \\ Var[X] &= np(1-p) \end{align*}$$`

<br>

Can we confirm this with Monte Carlo simulation?
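
One way such a check could look, as a minimal sketch. The values `n = 9` and `p = 1/3`, the number of repetitions `B = 1000`, and the names `B`, `flips`, and `sim_x` are assumptions chosen for illustration.

```r
set.seed(1234)

# Illustrative values (assumptions): n trials, success probability p,
# and B Monte Carlo repetitions
n <- 9
p <- 1/3
B <- 1000

# Storage for the number of heads in each repetition
sim_x <- numeric(B)

for (b in 1:B) {
  # One experiment: n coin flips, 1 = heads (prob p), 0 = tails
  flips <- sample(c(1, 0), size = n, prob = c(p, 1 - p), replace = TRUE)
  # Outcome of interest: the number of heads
  sim_x[b] <- sum(flips)
}

# Compare the simulated moments with the theoretical ones
c(sim_mean = mean(sim_x), theory_mean = n * p)
c(sim_var = var(sim_x), theory_var = n * p * (1 - p))
```

With many repetitions, the simulated mean and variance should land close to `n * p` and `n * p * (1 - p)`, though not exactly on them.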
---
class: middle

.content-box-green[**Example: Binomial distribution (PSE 3.4)**]

Suppose that we flip a coin `\(n=9\)` times and count the number of heads (i.e., `\(X\)`). The coin is not fair: `\(p=Pr[heads]=\frac{1}{3}\)`.

Theoretically,

+ `\(E[X]=np = 9 \times \frac{1}{3} = 3\)`
+ `\(Var[X]=np(1-p) = 9 \times \frac{1}{3} (1 - \frac{1}{3}) = 2\)`

---
class: middle

## Monte Carlo Simulation: Steps

1. Specify the data generating process
2. Generate data based on the data generating process
3. Compute the outcome you are interested in from the generated data
4. Repeat steps 2 and 3 many times
5. Compare your estimates with the true parameters

---

Theoretically, `\(E[X]= 3\)` and `\(Var[X] = 2\)`.

.medium-code[
```r
set.seed(1234)

# --- Step1: Specify the data generating process --- #
p <- 1/3
n <- 9 # the number of trials

# --- Step2: generate data --- #
seq_x <- sample(c(1,0), size = n, prob = c(p, 1-p), replace = TRUE)
seq_x
```

```
## [1] 0 0 0 0 1 0 0 0 0
```

```r
# --- Step3: get an outcome you are interested in --- #
sum(seq_x)
```

```
## [1] 1
```
]

---

# Step 4: For loop

```
## [1] 2.984
```

<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-16-1.png" width="80%" style="display: block; margin: auto;" />

---
class: inverse, center, middle
name: jensen

# Jensen's inequality

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

---

## Motivation

Linearity of expectation cannot be used when the function inside `\(E[\,]\)` is nonlinear.

.content-box-green[**Example:**]

+ If `\(g(X)\)` is a linear function (e.g., `\(g(x)=ax+b\)`)
  * `\(E[g(X)]=g(E[X])\)`
+ If `\(g(X)\)` is a nonlinear function (e.g., `\(g(x)=x^2\)`)
  * `\(E[g(X)] \neq g(E[X])\)` in general

---

(This is not a proof.)

On an earlier slide, we saw `\(Var[X]=E[(X-E[X])^2]=E[X^2] - (E[X])^2\)`.

Because `\(Var[X] \ge 0\)` (with `\(Var[X]=0\)` if and only if `\(X\)` is degenerate),
`$$E[X^2] - (E[X])^2 \ge 0$$`

<p style="text-align: center;">or</p>

`$$(E[X])^2 \leq E[X^2]$$`

Define `\(g(x)=x^2\)`. Then the inequality can be written as

`$$g(E[X]) \leq E[g(X)]$$`

More generally (Jensen's inequality),

`$$\begin{align*} g(E[X]) \leq E[g(X)] \quad &\text{if } g(x) \text{ is a convex function} \\ E[g(X)] \leq g(E[X]) \quad &\text{if } g(x) \text{ is a concave function} \end{align*}$$`

---

.content-box-green[**Visualization**]

.panelset[

.panel[.panel-name[Example 1 : g(x) is convex]

.left5[
Suppose that `\(g(x)=x^2\)`.

.small-code[
```r
library(ggplot2)
set.seed(356)

# Create a sequence of X from a uniform distribution
x <- runif(1000, 0, 10)

# /*===== Convex case: g(X)=X^2 =====*/
y <- x^2

figure_ex1 <- ggplot()+
  geom_point(aes(x = x, y = y))+
  # --- Add vertical line for E[X] --- #
  geom_vline(xintercept = mean(x), color = "red", linetype = "dashed")+
  annotate("text", x = mean(x)+1, y = 0.01, label = paste0("E[X]=", round(mean(x), 1)), size = 3, color = "red") +
  # --- Add horizontal line for E[g(X)] --- #
  geom_hline(yintercept = mean(y), color="blue", linetype = "dashed")+
  annotate("text", x = 1, y = mean(y)+5, label = paste0("E[g(X)]=", round(mean(y), 1)), size = 3, color = "blue") +
  # --- Add horizontal line for g(E[X]) --- #
  geom_hline(yintercept = mean(x)^2, color="darkgreen", linetype = "dashed")+
  annotate("text", x = 1, y = mean(x)^2-5, label = paste0("g(E(X))=", round(mean(x)^2, 1)), size = 3, color = "darkgreen") +
  theme_bw()
```
]
]

.right5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" />

`$$\color{darkgreen}{g(E[X])} \leq \color{blue}{E[g(X)]}$$`
]
]

.panel[.panel-name[Example 2: g(x) is concave]

.left5[
Suppose that `\(g(x)=\sqrt{x}\)`.
.small-code[
```r
# /*===== Concave case: g(X)=X^(1/2) =====*/
# (uses the same x as in Example 1)
y <- x^(1/2)

figure_ex2 <- ggplot()+
  geom_point(aes(x = x, y = y))+
  # --- Add vertical line for E[X] --- #
  geom_vline(xintercept = mean(x), color = "red", linetype = "dashed")+
  annotate("text", x = mean(x)+0.8, y = 0.01, label = paste0("E[X]=", round(mean(x), 1)), size = 3, color = "red") +
  # --- Add horizontal line for E[g(X)] --- #
  geom_hline(yintercept = mean(y), color = "blue", linetype = "dashed")+
  annotate("text", x = 1, y = mean(y)-0.2, label = paste0("E[g(X)]=", round(mean(y), 2)), size = 3, color = "blue") +
  # --- Add horizontal line for g(E[X]) --- #
  geom_hline(yintercept = mean(x)^(1/2), color = "darkgreen", linetype = "dashed")+
  annotate("text", x = 1, y = mean(x)^(1/2)+0.2, label = paste0("g(E(X))=", round(mean(x)^(1/2), 2)), size = 3, color = "darkgreen") +
  theme_bw()
```
]
]

.right5[
<img src="data:image/png;base64,#recitation2_slides_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" />

`$$\color{blue}{E[g(X)]} \leq \color{darkgreen}{g(E[X])}$$`
]
]
]

---

Question:

The problem sets in each assignment cover the material from the previous week.

**Question: Should I cover the material we learned this week, the previous week, or a mix of the two?**

<br>

**option (1): lab covers the material from this week**

.content-box-green[**pro:**] The lab reviews this week's class, which will be helpful for the next assignment.

.content-box-red[**con:**] You may not have had time to review the textbook or class notes, so the lab would be boring.

<br>

**option (2): lab covers the material from the previous week**

.content-box-green[**pro:**] You will have had enough time to review the textbook and class notes, so you are more likely to understand the lab.

.content-box-red[**con:**]